
[Compatible] Update Pinned PyTorch Nightly 1.13.0.dev20220801 #39

Open · wants to merge 4 commits into base pt-nightly-compatible
Conversation

aire-meta-bot
Collaborator

This PR is auto-generated by AIRE Meta Bot.

  • Please check the CI results before merging it.
  • Please use rebase merge to keep the pt-nightly-compatible branch up to date with the main branch.
  • If one or more tests fail, please commit the necessary changes to this branch (update-pinned-pytorch-nightly-1.13.0.dev20220801).

@comaniac
Contributor

comaniac commented Aug 1, 2022

Diagnosis:

  1. Make kl_div a composite function. pytorch/pytorch#80334 removes the kl_div op and makes it composite (see the sketch after this list).
  2. Replace all CHECK_ and DCHECK_ with TORCH_* macros. pytorch/pytorch#82032 added a TORCH_ prefix to all CHECK* macros.
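As a rough sketch of what "composite" means here (plain PyTorch; kl_div_composite and the exact decomposition are illustrative, not the upstream code), the op is now expressed with primitive ops, so backends no longer see a dedicated kl_div node:

```python
import torch
import torch.nn.functional as F

def kl_div_composite(input, target, reduction="mean"):
    # input holds log-probabilities, matching F.kl_div's convention.
    pointwise = target * (target.log() - input)
    if reduction == "mean":
        return pointwise.mean()
    if reduction == "sum":
        return pointwise.sum()
    return pointwise

inp = torch.randn(4, 5).log_softmax(dim=-1)
tgt = torch.randn(4, 5).softmax(dim=-1)
# The decomposition matches the dedicated op's result.
print(torch.allclose(kl_div_composite(inp, tgt), F.kl_div(inp, tgt, reduction="mean")))
```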

@comaniac
Contributor

comaniac commented Aug 1, 2022

Somehow PyTorch upstream now defers the initialization of lazy tensors, even for their shapes. This results in a failure in jit.script, because we rely on the input shape (the input is already on the lazy device in the training loop) to convert the PyTorch model to RAF.

While I don't have a clue about where to fix this, I worked around the problem by using the meta device to make sure we can still get the input shape without copying the tensor back to the CPU.
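For reference, a minimal sketch of the meta-device idea in plain PyTorch (not the actual ratex change): a meta tensor carries only shape and dtype metadata, so the shape can be read without any device-to-host copy.

```python
import torch

# Only metadata is allocated; no storage exists and nothing is copied to CPU.
x = torch.empty(32, 3, 224, 224, device="meta")
print(x.shape)   # torch.Size([32, 3, 224, 224])
print(x.device)  # meta

# The same idea on an existing tensor: .to("meta") keeps shape/dtype only,
# so shape-driven tracing (e.g. for the jit.script conversion) can still run.
y = torch.randn(4, 8)
print(y.to("meta").shape)  # torch.Size([4, 8])
```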

cc @hzfan @zachzzc @zhouyuan1119

@comaniac
Contributor

comaniac commented Aug 1, 2022

Hmm, the above solution does not always work. This CI run failed at test_image_model.py:test_compile_lenet_zero1. The stack trace shows that

#3  0x00007fff6ef3680a in torch_lazy_tensors::Helpers::GetPromotedShape (shape1_dims=..., shape2_dims=...)
    at /home/ubuntu/torch_in_conda/ratex/ratex/lazy_tensor_core/csrc/helpers.cpp:235
235         LTC_CHECK(dim1 == dim2 || dim1 == 1 || dim2 == 1)

This is an add op, with dim1=3 and dim2=0. Now I feel this is more like a bug...
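For context, the failing check follows the standard broadcasting rule, so a size-3 dimension against a size-0 dimension cannot be promoted. A rough Python re-implementation (get_promoted_shape here is illustrative, not the ratex code):

```python
from itertools import zip_longest

def get_promoted_shape(shape1, shape2):
    """Sketch of the check performed in Helpers::GetPromotedShape."""
    result = []
    # Align dimensions from the trailing axis, as NumPy/PyTorch broadcasting does.
    for dim1, dim2 in zip_longest(reversed(shape1), reversed(shape2), fillvalue=1):
        # The condition from helpers.cpp:235; dim1=3 with dim2=0 fails all three.
        assert dim1 == dim2 or dim1 == 1 or dim2 == 1, f"incompatible dims {dim1} vs {dim2}"
        result.append(max(dim1, dim2))
    return tuple(reversed(result))

print(get_promoted_shape((4, 3), (3,)))    # (4, 3)
print(get_promoted_shape((4, 3), (4, 0)))  # AssertionError: incompatible dims 3 vs 0
```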

@zachzzc please help take a look when you are available. Note that this is against PyTorch nightly version 20220801, and it is not urgent to fix.
